Problem Set 06

Author

Mansi Patel, mdp86

Published

2025-10-04

1

https://pubmed.ncbi.nlm.nih.gov/39392750/ Fig. 4, panels G and N. These panels have a y-axis structure that can easily be misinterpreted: the difference between the two conditions looks bigger than it actually is. While this might be justifiable, I think the actual points for daf-2 should not have been placed so close to 0, because that makes the value look like it is 0 when it probably isn't.

[Figure: Fig. 4, panels N and G]
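To illustrate the concern, here is a minimal sketch of how a value near the baseline can read as zero on a linear axis. The numbers are hypothetical and are not taken from the paper, and the log axis is just one possible alternative, not what the authors did or should have done.

```r
library(ggplot2)

# Hypothetical values for illustration only; NOT data from the paper.
toy <- data.frame(
  condition = c("control", "daf-2"),
  value     = c(100, 2)
)

# Linear axis from 0: the small value sits on the baseline and looks like 0.
p_linear <- ggplot(toy, aes(x = condition, y = value)) +
  geom_point(size = 3) +
  scale_y_continuous(limits = c(0, 110)) +
  labs(title = "Linear y axis") +
  theme_bw()

# Log10 axis: same data, but the small value is visibly above the baseline.
p_log <- ggplot(toy, aes(x = condition, y = value)) +
  geom_point(size = 3) +
  scale_y_log10() +
  labs(title = "Log10 y axis") +
  theme_bw()

p_linear
p_log
```

Printing the two plots side by side shows how the same pair of values can give a different impression depending on the axis.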

2

library(palmerpenguins)

Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':

    penguins, penguins_raw
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.1.0     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)

# ggplot2 extensions
library(ggrepel)
library(GGally)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
# packages requiring extra installation steps
library(ggmagnify)

# data
library(palmerpenguins)
data <- penguins

plot <- 
  ggplot(data = data,
       aes(x = flipper_length_mm, 
           y = bill_length_mm)) +
  geom_point(na.rm = TRUE, alpha = 0.5) + 
  labs(x = "Flipper length (mm)",
       y = "Bill length (mm)",
       title = "Data source: Palmer penguins") +
  theme_bw()

ggplotly(plot) # flipper length 181, bill length 58 is the outlier point
ggplot(data = data,
       aes(x = flipper_length_mm, 
           y = bill_length_mm)) +
  geom_point(na.rm = TRUE, alpha = 0.5, size = 1.5) + 
  # annotate() draws the highlight once, instead of once per data row
  annotate("point", x = 181, y = 58,
           color = "red", size = 3) + 
  annotate("segment", 
           x = 190, y = 60, 
           xend = 181, yend = 58,
           arrow = arrow(length = unit(0.3, "cm"), 
                         type = "open")) + 
  annotate("text",
           x = 195,
           y = 60,
           label = "Outlier?") + # looked up the syntax for annotate() and its segment and text features
  labs(x = "Flipper length (mm)",
       y = "Bill length (mm)",
       title = "Data source: Palmer penguins") +
  theme_bw()
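The outlier was found by hovering over the interactive plot. As an alternative, here is a rough sketch that picks the point out of the data and passes it to its own layer, so the coordinates do not have to be typed by hand. The cutoffs (`185` and `55`) are my assumption, chosen by eye, and may need adjusting.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Assumption: "outlier" means an unusually long bill among short-flippered
# birds. The two cutoffs are hypothetical and were chosen by eye.
outlier <- penguins |>
  filter(flipper_length_mm < 185, bill_length_mm > 55)

p_outlier <- ggplot(penguins,
                    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(na.rm = TRUE, alpha = 0.5) +
  # this layer gets its own small data frame, so it is drawn once per row of it
  geom_point(data = outlier, color = "red", size = 3) +
  labs(x = "Flipper length (mm)",
       y = "Bill length (mm)") +
  theme_bw()

p_outlier
```

Because the highlight layer inherits the x and y aesthetics, it always lands exactly on the real data point.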

data_for_heatmap <-
  read_csv("nytimes-covid-data-us-states_2022-02.csv")
Rows: 39094 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): geoid, state
dbl  (6): cases, cases_avg, cases_avg_per_100k, deaths, deaths_avg, deaths_a...
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
display.brewer.all()

color.scheme <- brewer.pal(9,"Reds")

 # not states: Puerto Rico, Virgin Islands, Guam, Northern Mariana Islands, American Samoa
 # (the filter below also drops Hawaii)

data_with_states_only <-
  data_for_heatmap |> 
  filter(!state %in% c("Puerto Rico", "Virgin Islands", "Guam", "Northern Mariana Islands", "American Samoa", "Hawaii"))
unique(data_with_states_only$state)
 [1] "Washington"           "Illinois"             "California"          
 [4] "Arizona"              "Massachusetts"        "Wisconsin"           
 [7] "Texas"                "Nebraska"             "Utah"                
[10] "Oregon"               "Rhode Island"         "New York"            
[13] "Florida"              "New Hampshire"        "Georgia"             
[16] "North Carolina"       "New Jersey"           "Tennessee"           
[19] "Nevada"               "Maryland"             "Colorado"            
[22] "South Carolina"       "Pennsylvania"         "Oklahoma"            
[25] "Minnesota"            "Kentucky"             "Indiana"             
[28] "Virginia"             "Vermont"              "Missouri"            
[31] "Kansas"               "District of Columbia" "Iowa"                
[34] "Connecticut"          "Ohio"                 "Louisiana"           
[37] "South Dakota"         "Michigan"             "Wyoming"             
[40] "North Dakota"         "New Mexico"           "Mississippi"         
[43] "Delaware"             "Arkansas"             "Maine"               
[46] "Alaska"               "Montana"              "Idaho"               
[49] "Alabama"              "West Virginia"       
data_with_states_only |>
  ggplot(aes(x = date, y = state, fill = deaths_avg_per_100k)) + 
  geom_raster() +
  scale_fill_gradientn(colors = color.scheme,
                       limits = c(0, 5))
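One thing to watch with `limits = c(0, 5)`: by default ggplot2 treats values outside the limits as missing, so any cell above 5 would show in the NA color instead of the darkest red. Below is a small sketch with made-up toy data (not the NYT data) that uses `oob = scales::squish` to clamp such values to the nearest limit. Whether the real data goes above 5 is something I have not checked here.

```r
library(ggplot2)
library(RColorBrewer)

# Toy data (hypothetical): one value, 7, is above the upper limit of 5.
toy <- data.frame(
  date  = rep(as.Date("2022-01-01") + 0:2, times = 2),
  state = rep(c("A", "B"), each = 3),
  rate  = c(0.5, 2, 7, 1, 3, 4)
)

p_squish <- ggplot(toy, aes(x = date, y = state, fill = rate)) +
  geom_raster() +
  scale_fill_gradientn(colors = brewer.pal(9, "Reds"),
                       limits = c(0, 5),
                       # clamp out-of-range values to the limits
                       # instead of turning them into NA
                       oob = scales::squish)

p_squish
```

With squishing, the cell with rate 7 is drawn in the same color as a cell with rate 5, so the legend should be read as "5 or more" at the top end.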

data lunch

This week's data lunch session was about the final project, which involves finding and recreating a figure from a published paper that has an available dataset.